Brain-computer interface (BCI) systems usually use electroencephalogram (EEG)
signals as the intermediary that lets the brain drive external devices directly.
BCI capabilities are being extended by using multivariable EEG signals as
movement commands. EEG signals are recorded over multiple channels, which
enriches the available information provided that a suitable method and
architecture are used. This research proposes a two-dimensional
convolutional neural network (2D-CNN) method to recognize multi-channel
EEG signals. The vertical dimension is the channel, while the horizontal
dimension is the signal sequence. Hence, each signal is connected to the
time-series information of its own channel and of the other channels
simultaneously. The BCI was built on multivariable signals, specifically motor
imagery and emotion. The two variables have different characteristics, and their
information comes from different channels. Therefore, multiple CNNs are needed
to recognize the two variables in the EEG signal. The experiment showed that
multiple 2D-CNNs reached an accuracy of 94.62%, compared with 85.44% for a
single 2D-CNN and 82.04% for multiple 1D-CNNs. The 2D-CNN extracts the channels
into vectors while maintaining the signal sequence. Signal extraction is also
essential: the Wavelet filter raised accuracy from 73.75% to 94.62%.
1. INTRODUCTION

A brain-computer interface (BCI) is a human-machine interaction technology that allows external
devices to be moved through brain commands without involving gestures, muscles, or voice. BCI is a
way for the human brain and a computer to communicate directly. This technology can help someone
with a physical disability drive an external device. BCI has taken on a growing number of application
challenges [1], and it has gained attention and recognition in medical rehabilitation for post-stroke
patients [2], wheelchairs [3], and neuromuscular defects [4]. BCI has been developed as a prosthetic
hand in the rehabilitation process for stroke survivors [5]. The technology is also used for practicality
or entertainment, such as games [6], movement imagery in games [7], real-time robots [8], education [9],
military, and other applications [10]. BCI works by receiving brain commands as input, which an
intermediary captures to drive external devices. The electroencephalogram (EEG) signal is often used as
the intermediary that translates brain commands into control of an external device. BCI often uses EEG
signals because of their high time resolution, relative convenience, low cost, and effectiveness
compared to other methods [11].

Commands executed by BCI involve several variables captured from the EEG signal. BCI can be driven
by a conscious command issued through imagination, called motor imagery (MI). Previous research used
motor imagery to identify imagined hand movements [12], control a mouse [13], and control external
devices such as wheelchairs [14]. Motor imagery tasks include, among others, arithmetic, imagination,
movement, and singing [15]. Other BCI paradigms are steady-state visually evoked potentials (SSVEP) and
P300 evoked potentials. MI is active, so users can control the equipment simply by imagining. SSVEP and
P300, despite their high transmission speed, need additional equipment, which makes them less
convenient for implementing BCI [16].

Emotion identification provides a coupling connection [17], so it can drive BCI [18]. There are
exogenous and endogenous BCI variables. Exogenous BCI needs external conditions to stimulate the
brain into producing a specific response, such as steady-state visually evoked potentials (SSVEP) [19].
Emotion belongs to the exogenous, neuropsychological part of BCI development [20]. Endogenous BCI
includes motor imagery, in which imagining something serves as the command. Emotional and
physiological states are related to feelings [21] and are essential for daily interactions and physical
motor activities [22]. Alpha and Beta waves in the 8-30 Hz range represent neutral and happy emotions
[23]. In several previous studies, MI induced emotion [24]. BCI with motor imagery alone outperformed
BCI in which emotions stimulated motor imagery [24]. However, involving emotions can enrich BCI
control. Moreover, electrical activity in the brain also occurs unconsciously, as with emotions, and can
induce motor imagery [24], so combining the two also forms a brain command [25]. Other studies used
multivariable commands such as motor imagery and focus variables [26].

The EEG signal used for BCI control can be single-channel or multi-channel. Usually, a single
channel is paired with a simple stimulus such as a blink [27] or a binary relaxed or not-relaxed state
[6]. This research used multi-channel EEG for BCI. It also used multivariable EEG signals, specifically
motor imagery and emotion. The two variables come from different EEG channels and have different
patterns and characteristics. Separating them into different networks saves memory and improves
performance. Furthermore, emotions that induce motor imagery form a class that engages both.
Therefore, multiple networks are needed.

Convolutional neural networks (CNN) and recurrent neural networks (RNN) are often used to
identify commands through EEG signals in BCI. CNN is good at extracting features automatically and
addressing various tasks end-to-end. RNN is good at learning sequence data, such as EEG signals of
hand movement imagination [12]. Previous studies used RNN to identify motor imagery induced by
emotion [28], and RNN applied to motor imagery and focus gave an accuracy of 77.88% [26]. Each
variable processed by each network comes from multiple channels. Therefore, a suitable model must be
chosen to maintain signal connectivity between channels and the time sequence within each channel
simultaneously. The 2D CNN architecture allows signal connectivity between channels and across time
simultaneously. When the features extracted by a 2D CNN become the input of a time-sequence model
such as an RNN, the approach is called spatio-temporal. A previous study processed each channel of the
EEG signal with a Parallel Sequence-Channel Projection 2D-CNN and obtained accuracies of 95.96% and
96.24% [29]. Selecting the correct number of channels increased the average accuracy by about 20%
[30]. Therefore, the EEG signal from multiple channels can be treated as spatial in the channel
direction and temporal in the time sequence, which gave emotion recognition accuracies of 74.4% and
71.4% [30].

Motor imagery is characterized by wave activity in the Mu frequency band caused by the
desynchronization of rhythm patterns [31], usually over the sensory-motor cortex or, in the dataset used
here, the Fp5 and Fp6 channels. Several studies have shown that the Beta band is also involved in this
event [32]. Frequency bands are often subject-specific and can vary slightly between subjects [33]. Mu
and Beta waves lie in the 8-30 Hz band [13], while neutral and joyful emotions appear in the Alpha and
Beta bands, also at 8-30 Hz [25].

A frequency filter is needed to isolate the EEG signal pattern of each variable. Filtering focuses
the classifier on a smaller amount of data and improves performance [34]. Some studies used the fast
Fourier transform (FFT) [35], while others used the Wavelet transform [31]. Wavelets have an advantage
in decomposing non-stationary data such as EEG signals. Here, the Wavelet filter passes signals in the
Alpha, Mu, and Beta frequency bands. Previous research used it to identify motor imagery and emotion
variables with an accuracy of 90% [25].

This research used EEG signals with motor imagery and emotion variables as the BCI control.
The proposed method used multiple networks that separate the variables, which have different
characteristics and channels. EEG signals are filtered using Wavelets, and spatial-temporal features are
extracted using 2D-CNN. Separating the networks saves memory during pattern recognition and allows
the two variables to be combined afterwards. Previous studies obtained higher accuracy with multiple
CNNs than with single networks [36].


2. RESEARCH METHOD 

BCI commands can carry EEG signal information with one or more variables. Motor imagery, in
which the brain imagines specific movements, is often used for BCI actions. During motor imagery, Mu
and Beta waves are desynchronized, so these two waves at 8-30 Hz represent the motor imagery
pattern. Emotions often occur unconsciously, so they inevitably influence BCI. Therefore, motor
imagery and emotion as BCI action variables make up eight classes, as shown in Figure 1. For this
purpose, the EEG signal is first filtered using Wavelet. The two variables are then processed by separate
networks, and the action combines the two resulting classes. This is why the architecture is called
multivariable BCI of multi-channel EEG. Using a two-dimensional CNN (2D CNN) facilitates the
processing of multi-channel EEG signals. The details of each process are described next.
Figure 1. Multi-channel EEG signal in multivariable BCI


The computational model is shown in Figure 1. EEG signals for training were taken from 30
volunteers across eight classes, combining two emotions (happy and neutral) with four motor imagery
commands (forward, left, right, and stop). A BCI command is issued every four seconds, so each training
and testing sample covers four seconds of signal.
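
As a minimal sketch of how two separate networks can yield one of the eight commands, the snippet below pairs the most likely emotion with the most likely motor imagery class. The function name `combine_predictions`, the label order, and the example scores are illustrative assumptions, not the implementation used in this study.

```python
# Hypothetical label order; the paper lists these classes but not their indices.
EMOTIONS = ["neutral", "happy"]
MOTOR_IMAGERY = ["forward", "left", "right", "stop"]


def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def combine_predictions(emotion_scores, mi_scores):
    """Pair the top emotion and the top motor imagery class into one of 8 commands."""
    e = argmax(emotion_scores)
    m = argmax(mi_scores)
    combined_index = e * len(MOTOR_IMAGERY) + m  # integer in 0..7
    return combined_index, (EMOTIONS[e], MOTOR_IMAGERY[m])


# Example scores, as if produced by the two networks for one 4-second segment.
index, labels = combine_predictions([0.2, 0.8], [0.1, 0.6, 0.2, 0.1])
print(index, labels)  # 5 ('happy', 'left')
```

Keeping the two outputs separate until this last step means each network only has to discriminate two or four classes rather than all eight at once.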


2.1. EEG datasets 


Training and validation data were obtained by recording four-second EEG signals from 30
people aged 18-25 years, who were healthy and cooperative, so that variables outside the scope of the
study could be ignored. The EEG instrument was an Emotiv EPOC with 14 channels and a 128 Hz
sampling frequency [25]. There are two emotion classes used as stimulation: neutral and happy. The
four motor imagery classes are forward, stop, right, and left. Data were collected from 30 subjects x
8 classes x 6 segments, or 1440 sets. The instruction sequence is shown in Figure 2.
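
The dataset dimensions quoted above can be checked with a few lines of arithmetic. The constant names are illustrative, and the (channels, samples) layout assumes all 14 channels are kept before any per-network channel selection.

```python
SUBJECTS = 30
CLASSES = 8           # 2 emotions x 4 motor imagery commands
SEGMENTS = 6          # segments per subject and class
CHANNELS = 14         # channels of the EEG headset
SAMPLING_HZ = 128
SEGMENT_SECONDS = 4

n_sets = SUBJECTS * CLASSES * SEGMENTS
samples_per_channel = SAMPLING_HZ * SEGMENT_SECONDS
segment_shape = (CHANNELS, samples_per_channel)  # rows = channels, columns = time

print(n_sets)               # 1440
print(samples_per_channel)  # 512
print(segment_shape)        # (14, 512)
```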


[Figure 2 timeline: emotion stimulation (neutral emotion, break, happy emotion) with Instructions 1-4; axis ticks at 0, 60, 90, 120, 150, and 180]


Figure 2. BCI dataset with instruction 


2.2. Wavelet filter 


The discrete wavelet transform allows signals to be represented in the time domain while
eliminating unwanted frequency bands. A wavelet transform is the convolution of the signal with the
function defined in (1). The Wavelet transforms the signal by decomposition and reconstruction.
Decomposition filters a specific frequency range by dividing the signal into two parts, the
low-frequency approximation (2) and the high-frequency detail (3), as shown in Figure 3.
Reconstruction combines the parts back into a time-domain signal [37].


$$\psi_{a,b}(n) = \frac{1}{\sqrt{a}}\,\psi\left(\frac{n-b}{a}\right) \tag{1}$$
$$y_{low}(k) = \sum_{n} x(n)\,g(n-k) \tag{2}$$
$$y_{high}(k) = \sum_{n} x(n)\,h(n-k) \tag{3}$$


Where ψ is the mother wavelet with scale a and shift b, g(n) holds the approximation (low-pass)
coefficients, h(n) holds the detail (high-pass) coefficients, x(n) is the original EEG signal, k is the
shift, and n is the sample index. Several Wavelet kernels exist, e.g., Symlet, Daubechies, and Haar.
Previous studies found Daubechies (Db4) suitable for asymmetric signals [38]. The decomposition is
shown in Figure 3.
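
The decomposition and reconstruction steps can be sketched in a few lines. This is an illustration under stated assumptions, not the filter used in this study: it swaps in the Haar kernel for brevity instead of Db4, it assumes the usual dyadic convention of keeping every second sample after filtering, and the helper names are hypothetical. The second part lists the nominal frequency range of each detail level for a 128 Hz signal and picks the levels that overlap 8-30 Hz.

```python
import math


def haar_decompose(x):
    """One level: low-pass (approximation) and high-pass (detail) outputs,
    each keeping one value per pair of input samples."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Invert haar_decompose and return the time-domain signal."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def detail_bands(fs, levels):
    """Nominal frequency range in Hz of the detail coefficients at each level."""
    return {j: (fs / 2 ** (j + 1), fs / 2 ** j) for j in range(1, levels + 1)}


def levels_overlapping(fs, levels, low, high):
    """Detail levels whose nominal range overlaps the open interval (low, high)."""
    return [j for j, (lo, hi) in detail_bands(fs, levels).items()
            if lo < high and hi > low]


signal = [1.0, 3.0, 2.0, 6.0]
approx, detail = haar_decompose(signal)
restored = haar_reconstruct(approx, detail)

print(levels_overlapping(128, 4, 8, 30))  # [2, 3]
```

Under these assumptions, a band-limited signal would be obtained by zeroing the coefficients of the levels outside the selection before reconstruction. The nominal ranges are idealized, since real wavelet filters have gradual roll-off.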


Figure 3. Wavelet decomposition of an EEG signal with a 128 Hz sampling frequency and an 8-30 Hz filter


2.3. Multiple CNN: Motor imagery and emotion networks 

CNNs can work with one-, two-, and three-dimensional inputs. Spatial, or two-dimensional, CNNs
are widespread, and previous studies have converted the EEG signal into a time-frequency
representation before using a CNN [39]. A CNN has two parts, namely the feature extraction layers and
the classification layer. Feature extraction consists of convolution, a rectifier activation function, and
pooling layers, while the fully connected layer handles the classification task [40]. Using a CNN also
requires tuning parameters, which determine performance and can reduce complexity [40]. Convolution
combined with an activation function gives the network computation its non-linearity [39]. The rectified
linear unit (ReLU) is often used. Max pooling then makes the representation coarser without losing the
pattern, which helps control overfitting [41]. The results of the extraction process can be seen in
Figure 4. There are two networks: the CNN architecture for emotion in Figure 4(a) and the motor imagery
architecture in Figure 4(b). The 2D-CNN architecture reduces the 512 points of the Wavelet results to
92 points.
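
The feature extraction chain of convolution, ReLU, and max pooling can be illustrated on a small channels x time matrix. The sketch below is written in plain Python for clarity; the kernel values, kernel size, pooling size, and toy input are all assumptions chosen for illustration and do not reproduce the layer settings of Figure 4.

```python
def conv2d_valid(signal, kernel):
    """Valid 2D cross-correlation of a channels x time matrix with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(signal) - kh + 1
    out_w = len(signal[0]) - kw + 1
    return [
        [
            sum(signal[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Rectified linear unit applied element-wise."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, ph, pw):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    rows = len(fmap) // ph
    cols = len(fmap[0]) // pw
    return [
        [
            max(fmap[r * ph + i][c * pw + j] for i in range(ph) for j in range(pw))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy input: 4 channels x 16 time samples (real segments are far longer).
signal = [[float((ch + 1) * (t % 4)) for t in range(16)] for ch in range(4)]
kernel = [[1.0, 0.0, -1.0],
          [1.0, 0.0, -1.0]]  # spans 2 adjacent channels and 3 time steps

features = max_pool(relu(conv2d_valid(signal, kernel)), 1, 2)
print(len(features), len(features[0]))  # 3 7
```

Because the kernel covers more than one row, every output value mixes neighbouring channels as well as neighbouring time samples, which is the property that distinguishes the 2D arrangement from a per-channel 1D convolution.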
Figure 4. Feature extraction and identification model using 2D CNN for (a) emotion and (b) motor imagery

The classification layer consists of fully connected neurons. The architecture often used is the
multi-layer perceptron (MLP). Other studies use the support vector machine [42]. The training algorithm
often used is backpropagation, which adjusts the weights during training [25]. Accuracy is calculated as
the proportion of correctly identified data. In addition, loss is the difference between the target and
the computed output, measured with the cross-entropy function used to optimize the parameters [43],
as in (4).


$$Loss = -\sum_{i=1}^{C} t_i \log(y_i) \tag{4}$$


Where C is the number of class labels, t_i is the target value, and y_i is the actual output.
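
Equation (4) can be evaluated directly for one sample. In the sketch below the softmax output layer, the small `eps` guard against log(0), and the example values are assumptions added for illustration; the text only specifies the cross-entropy formula itself.

```python
import math


def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, output, eps=1e-12):
    """Loss = -sum_i t_i * log(y_i) over the C class labels."""
    return -sum(t * math.log(max(y, eps)) for t, y in zip(target, output))


target = [0.0, 1.0, 0.0, 0.0]            # one-hot target, second of C = 4 classes
output = softmax([0.5, 2.0, 0.1, -1.0])  # example network outputs
loss = cross_entropy(target, output)
print(round(loss, 3))
```

With a one-hot target only the term of the correct class contributes, so the loss approaches zero as the predicted probability of that class approaches one.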


3. RESULTS AND DISCUSSION 

Experiments in this study were carried out in several parts, mainly the effect of the Wavelet
filter, the use of multiple networks, and spatial-temporal networks with 2D-CNN. There are 1440 data
sets for model development and validation. Each subject was given instructions for the imagined
movement while stimulated by the specific induced emotion. The dataset was divided into 80%, or 1152
sets, for model development (training) and 20%, or 288 sets, for validation. The weight correction
technique used in learning also affects model performance, so three variants were tested: adaptive
moment estimation (Adam), adaptive learning rate (AdaDelta), and stochastic gradient descent (SGD).
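
The 80/20 division can be sketched as a shuffled index split. The text does not state whether the split was random or grouped by subject, so the random shuffle, the seed, and the function name `split_indices` are assumptions for illustration only.

```python
import random


def split_indices(n_items, train_fraction=0.8, seed=0):
    """Shuffle item indices and split them into training and validation parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary illustrative choice
    n_train = int(round(n_items * train_fraction))
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(1440)
print(len(train_idx), len(val_idx))  # 1152 288
```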

The overall performance of EEG signal identification for multiple BCI variables using multiple
2D-CNN is shown in Table 1. The results were obtained on the validation data, with accuracy and loss as
the performance measures. Based on the hypothesis and literature review, four aspects were tested in
the experiment. First, multiple networks for the two EEG signal variables (emotion-induced motor
imagery) were compared with a single network. Second, the CNN configuration with spatial-temporal
involvement was compared with a one-dimensional CNN. Third, the effect of the Wavelet EEG signal
filter on performance was examined using multiple 2D-CNN. Fourth, the weight correction techniques
Adam, SGD, and AdaDelta were compared.


Table 1. Performance of multiple CNNs on multivariable EEG signals

                        Accuracy (%)                      Loss
Wavelet   Optimizer     2D CNN    2D CNN     1D CNN       2D CNN    2D CNN     1D CNN
                        single    multiple   multiple     single    multiple   multiple
With      Adam          85.44     94.62      82.04        0.848     0.110      1.356
With      AdaDelta      83.76     92.37      80.88        0.905     0.167      1.269
With      SGD           80.11     90.89      81.98        0.877     0.205      1.290
Without   Adam          73.75     74.04      69.25        1.208     1.120      1.356
Without   AdaDelta      771.88    75.95      70.12        1.110     0.998      1.269
Without   SGD           76.98     76.12      71.63        1.427     1.105      1.290


In Table 1, the Wavelet filter reduced the significant signal components from 512 to 92 points
without losing important information. Wavelet extraction is the most important factor, giving a relative
increase in accuracy of 28%. This confirms that filtering makes it easier for the system to learn the EEG
signal pattern, which has eight classes across both variables. Using a two-dimensional CNN also affects
how signals are mapped spatially and temporally. The 2D CNN method connects channels and sequences
simultaneously, which increased the accuracy by 15.33% relative to 1D CNN. The use of multiple
networks also increased accuracy significantly, from 85.44% to 94.62%, a relative gain of 10.74%. The
Adam weight correction technique provided the best accuracy, while AdaDelta and SGD trailed by a
relatively small margin. The differences for the three factors are shown in Figures 5-7 in terms of
accuracy and loss.
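
The percentages quoted in this paragraph are relative gains over the respective baseline accuracy rather than differences in percentage points. A short check, using a hypothetical helper `relative_gain` and the accuracies reported for the Adam optimizer:

```python
def relative_gain(baseline, improved):
    """Relative increase of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


best = 94.62  # multiple 2D CNN with Wavelet and Adam

print(round(relative_gain(85.44, best), 2))  # vs single 2D CNN    -> 10.74
print(round(relative_gain(82.04, best), 2))  # vs multiple 1D CNN  -> 15.33
print(round(relative_gain(73.75, best), 2))  # vs no Wavelet filter -> 28.3
```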

The results showed that using multiple networks increased accuracy and provided stability and
faster convergence, as shown in Figure 5. Figure 5(a) shows accuracy over 300 epochs, and Figure 5(b)
shows the loss. Each network learns to recognize patterns that share the same characteristics, whether
in the waves they contain or in the channels they come from. The three weight correction techniques
perform similarly at steady state. However, Adam corrects the weights directionally, so accuracy
improves and loss decreases quickly. The other two techniques tend to be slower because their
corrections rely on random parameters.
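
To make the difference between the update rules concrete, the sketch below applies plain gradient descent and Adam to a one-parameter toy loss. The loss function, learning rates, and step counts are arbitrary assumptions, and the example only illustrates how each rule computes a weight correction; it does not reproduce the convergence behaviour observed on the EEG data.

```python
import math


def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def run_sgd(w, lr=0.1, steps=100):
    """Plain gradient descent: step against the gradient with a fixed rate."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def run_adam(w, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Adam: scale each step by running averages of the gradient and its square."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


print(run_sgd(0.0), run_adam(0.0))
```

Both runs start at w = 0 and move toward the minimum at w = 3. In a real network the gradient comes from mini-batches, which is where the stochastic element of SGD enters.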

In the test of CNN configuration, accuracy increased more with 2D-CNN than with 1D-CNN, as
shown in Figure 6. Figure 6(a) shows accuracy over 300 epochs, and Figure 6(b) shows the loss, which
likewise decreased for all weight correction techniques. However, Adam improved steadily through to
the steady-state stage. In the CNN configuration experiment, both configurations used multiple
networks so that the more significant effect, multiple versus single networks or spatial-temporal CNN,
could be identified. The spatial-temporal CNN provided the larger increase in accuracy, 15.33%, which
can also be seen in Table 1.


Figure 5. Effect of multiple networks compared with single networks on (a) accuracy and (b) loss

Figure 6. Effect of spatial-temporal 2D CNN compared with 1D CNN on (a) accuracy and (b) loss


Moreover, Table 1 and Figure 7 illustrate that the Wavelet filter plays the most important role in
identifying EEG signals, and BCI is no exception. Figure 7(a) shows accuracy over 300 epochs, and
Figure 7(b) shows the loss. The performance is much improved with the Wavelet filter, even when
multiple 2D CNNs are already used. As in machine learning generally, performance is determined
primarily by selecting the right features. Adam also remains consistent in performance across
configurations.


[Figure 7 plots: Model Accuracy and Model Loss versus epoch for Wavelet and Non Wavelet inputs with Adam, AdaDelta, and SGD]


Figure 7. Effect of the Wavelet filter in BCI with multiple 2D CNN on (a) accuracy and (b) loss


The next experiment compares 2D-CNN with recurrent neural networks (RNN) for both multiple
and single networks, as shown in Table 2. All other settings were kept the same. 2D CNN performed
better in accuracy and loss for both multiple and single networks. Interestingly, the performance drop
from multiple to single networks is more extreme with RNN, from 90.57% to 71.56%, whereas with 2D
CNN accuracy decreased only from 92.37% (multiple) to 83.76% (single). The effect of Wavelet was
likewise tested for both methods. Moreover, when multiple RNN networks were used, AdaDelta gave a
low accuracy of 60.05%. These results show that 2D CNN is more robust than RNN: it performs better
on unfiltered signals, and its accuracy with a single network falls less than that of RNN.


Table 2. Performance of 2D CNN compared with RNN

                       Accuracy (%)                           Loss
                       Multiple networks   Single networks    Multiple networks   Single networks
Wavelet   Optimizer    2D CNN    RNN       2D CNN    RNN      2D CNN    RNN       2D CNN    RNN
With      Adam         94.62     90.57     85.44     71.56    0.110     0.369     0.848     2.012
With      AdaDelta     92.37     60.05     83.76     -        0.167     0.742     0.905     -
With      SGD          90.89     81.39     80.11     -        0.205     0.177     0.877     -
Without   Adam         73.75     68.73     74.04     -        1.120     1.322     1.356     -


The BCI configuration of multivariable motor imagery and emotion with multiple 2D CNNs was
compared with previous studies on the same data, as shown in Table 3 for the Adam optimizer. With a
deeper convolutional architecture, namely VGG16, the earlier study using a single network and 1D CNN
provided only 87% accuracy with a learning time of up to five minutes. In comparison, the multiple 1D
CNN networks tested in this study provided an accuracy of only 82.04%, while multiple networks with
2D CNN provided an accuracy of 94.62%. VGG16 has 13 convolution layers and three pooling layers. It
naturally provides deeper learning, so its high accuracy is reasonable, but it takes much longer to
train. The earlier study needed VGG16 because its CNN input was 3,456 points per data set, with all
variables and all channels combined. This study, in contrast, requires only two convolution and pooling
layers, which gives a much shorter training time, with an accuracy of 82.04% for 1D CNN and 94.62%
for 2D CNN. However, deep architectures like VGG16 were not tested in this study.


Table 3. Comparison with previous works

                                          Accuracy (%)                    Learning time (minutes)
Methods                                   2 layers   8 layers   VGG16     2 layers   8 layers   VGG16
Single Networks-1D [25]                   -          84.73      87.09     -          4          47
Multiple Networks-1D (proposed methods)   82.04      -          -         0.009      -          -
Multiple Networks RNN [28]                90.57                           1.450
Multiple Networks-2D (proposed methods)   94.62      -          -         0.001      -          -


4. CONCLUSION 

Using emotion variables from EEG signals that induce motor imagery in the brain-computer
interface provides future development opportunities. Parallel computing in the form of multiple
networks provides a significant performance increase of 10.74%. Meanwhile, the large amount of
information from the many channels of EEG signals needs to be represented in a proper architecture.
Multiple 2D CNN in BCI therefore provides an accuracy of 94.62%, which is 15.33% greater than 1D
CNN. Another critical factor is the use of filters for signal extraction. Wavelet is the right choice,
providing an increase in accuracy of up to 28.29%. The chosen architecture and configuration can be
applied to other BCI applications or to EEG signals in general. Identifying brain commands through EEG
signals is an exciting and open-ended research area. Although it can be done with a single variable, the
use of multiple variables is inevitable. Emotion or concentration can induce motor imagery when
imagining commands. In future work, it is necessary to consider 2D CNN in combination with either
parallel or serial RNN.